Minimum message length inference of secondary structure from protein coordinate data
Abstract
MOTIVATION: Secondary structure underpins the folding pattern and architecture of most proteins, so accurately assigning secondary structure elements is an important problem. Although many approximate solutions to the secondary structure assignment problem exist, the problem itself has resisted a consistent, mathematically rigorous definition. Several comparative studies have highlighted major disagreements in how the available methods define secondary structure and assign it to coordinate data.

RESULTS: We report a new method to infer secondary structure based on the Bayesian method of minimum message length (MML) inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data, and it seeks to maximize the joint probability of a hypothesis and the data. There is a natural null hypothesis, and any assignment that cannot improve on it is unacceptable. We developed a program, SST, based on this approach and compared it with popular programs such as DSSP and STRIDE, among others. Our evaluation suggests that SST gives reliable assignments even on low-resolution structures.

AVAILABILITY: http://www.csse.monash.edu.au/~karun/sst
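The core MML idea in the abstract can be illustrated with a toy sketch. Maximizing the joint probability P(H)·P(D|H) is equivalent to minimizing the two-part message length -log2 P(H) - log2 P(D|H), so competing assignments, including the null hypothesis, can be ranked by length in bits. Everything concrete below is a hypothetical simplification chosen for illustration and is not the model used by SST: the data are one "helix-like" bit per residue rather than coordinates, the probabilities 0.9 and 0.5 are assumed, the prior is a 1-bit flag plus a uniform code over candidate segments, and the names `message_length` and `infer_helix` are invented for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL toy data model (not SST's): each residue carries one bit saying
# whether its local backbone geometry looks helical (1) or not (0).
P_FIT_IN_HELIX = 0.9    # assumed P(bit = 1) for a residue inside a claimed helix
P_FIT_BACKGROUND = 0.5  # assumed P(bit = 1) for a residue outside any helix


def bits_for_residue(bit: int, p_one: float) -> float:
    """Code length in bits of one observed bit when P(bit = 1) = p_one."""
    return -math.log2(p_one if bit == 1 else 1.0 - p_one)


def message_length(data: Sequence[int], helix: Optional[Tuple[int, int]]) -> float:
    """Two-part message length -log2 P(H) - log2 P(D | H), in bits.

    helix=None is the null hypothesis (no secondary structure claimed);
    otherwise helix is an inclusive (start, end) residue range.
    """
    n = len(data)
    if helix is None:
        hypothesis_bits = 1.0  # assumed prior: a 1-bit flag saying "null"
        start, end = n, -1     # empty range: no residue lies inside a helix
    else:
        start, end = helix
        # Assumed prior: 1 flag bit plus a uniform code over all n(n+1)/2 segments.
        hypothesis_bits = 1.0 + math.log2(n * (n + 1) / 2)
    data_bits = sum(
        bits_for_residue(bit, P_FIT_IN_HELIX if start <= i <= end else P_FIT_BACKGROUND)
        for i, bit in enumerate(data)
    )
    return hypothesis_bits + data_bits


def infer_helix(data: List[int]) -> Tuple[Optional[Tuple[int, int]], float]:
    """Exhaustively score every single-helix hypothesis against the null.

    A helix is accepted only if its total message is strictly shorter than
    the null hypothesis; otherwise (None, null_length) is returned.
    """
    best: Optional[Tuple[int, int]] = None
    best_len = message_length(data, None)
    for start in range(len(data)):
        for end in range(start, len(data)):
            length = message_length(data, (start, end))
            if length < best_len:
                best, best_len = (start, end), length
    return best, best_len


if __name__ == "__main__":
    toy = [0] * 5 + [1] * 12 + [0] * 5
    helix, bits = infer_helix(toy)
    print(helix, round(bits, 2))
```

Because the null hypothesis is scored in the same units as every candidate assignment, the search returns `None` whenever no claimed helix shortens the total message, which mirrors the rule that an assignment must beat the null hypothesis to be accepted.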
Similar resources
Piecewise linear approximation of protein structures using the principle of minimum message length
Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features (that is, helices and strands of sheet) by vectors or linear segments. Relying solely on standard secondary structure may...
Introduction to Minimum Encoding Inference
This paper examines the minimum encoding approaches to inference, Minimum Message Length (MML) and Minimum Description Length (MDL). This paper was written with the objective of providing an introduction to this area for statisticians. We describe coding techniques for data, and examine how these techniques can be applied to perform inference and model selection.
Intrinsic Classification of Spatially Correlated Data
Intrinsic classification, or unsupervised learning of a classification, was the earliest application of what is now termed minimum message length (MML) or minimum description length (MDL) inference. The MML algorithm ‘Snob’ and its relatives have been used successfully in many domains. These algorithms treat the ‘things’ to be classified as independent random selections from an unknown populati...
Minimum Message Length based Mixture Modelling using Bivariate von Mises Distributions with Applications to Bioinformatics
The modelling of empirically observed data is commonly done using mixtures of probability distributions. In order to model angular data, directional probability distributions such as the bivariate von Mises (BVM) are typically used. The critical task involved in mixture modelling is to determine the optimal number of component probability distributions. We employ the Bayesian information-theoret...
A New Message Length Approximation for Parameter Estimation and Model Selection
This paper examines Bayesian two-part coding schemes as tools for parameter estimation and model selection. The Wallace–Freeman message length approximation to strict minimum message length can be used to obtain two-part message lengths. However, this approximation relies on some strong assumptions regarding the likelihood function and prior distribution which do not hold for a large range of m...
Volume: 28
Publication date: 2012